NSF PAR Search | NSF Public Access Repository

Federated Learning with Spiking Neural Networks in Heterogeneous Systems

https://doi.org/10.1109/ISVLSI59464.2023.10238618

Tumpa, Sadia Anjum; Singh, Sonali; Khan, Md Fahim Faysal; Kandemir, Mahmut Taylan; Narayanan, Vijaykrishnan; Das, Chita R (June 2023, IEEE)
Stash: A Comprehensive Stall-Centric Characterization of Public Cloud VMs for Distributed Deep Learning

https://doi.org/10.1109/ICDCS57875.2023.00023

Sharma, Aakash; Bhasi, Vivek M; Singh, Sonali; Jain, Rishabh; Gunasekaran, Jashwant Raj; Mitra, Subrata; Kandemir, Mahmut Taylan; Kesidis, George; Das, Chita R (July 2023, IEEE)

Deep neural networks (DNNs) are increasingly popular owing to their ability to solve complex problems such as image recognition, autonomous driving, and natural language processing. Their growing complexity, coupled with the use of larger volumes of training data (to achieve acceptable accuracy), has warranted the use of GPUs and other accelerators. Such accelerators are typically expensive, with users having to pay a high upfront cost to acquire them. For infrequent use, users can instead leverage the public cloud to mitigate the high acquisition cost. However, with the wide diversity of hardware instances (particularly GPU instances) available in the public cloud, it becomes challenging for a user to make an appropriate choice from a cost/performance standpoint. In this work, we try to address this problem by (i) introducing a comprehensive distributed deep learning (DDL) profiler, Stash, which determines the various execution stalls that DDL suffers from, and (ii) using Stash to extensively characterize various public cloud GPU instances by running popular DNN models on them. Specifically, Stash estimates two types of communication stalls, namely, interconnect and network stalls, that play a dominant role in DDL execution time. Stash is implemented on top of prior work, DS-analyzer, that computes only the CPU and disk stalls. Using our detailed stall characterization, we list the advantages and shortcomings of public cloud GPU instances to help users make informed decisions. Our characterization results indicate that the more expensive GPU instances may not be the most performant for all DNN models and that AWS can sometimes sub-optimally allocate hardware interconnect resources. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time and the network-connected instances can suffer from up to 5× slowdown compared to training on a single instance. Furthermore, (iii) we also model the impact of DNN macroscopic features such as the number of layers and the number of gradients on communication stalls, and finally, (iv) we briefly discuss a cost comparison with existing work.
Full Text Available
Stash: A comprehensive stall-centric characterization of public cloud VMs for distributed deep learning

Sharma, Aakash; Bhasi, Vivek; Singh, Sonali; Jain, Rishabh; Raj, Jashwant; Mitra, Subrata; Kandemir, Mahmut Taylan; Kesidis, George; Das, Chita (January 2023, Proceedings of the International Conference on Distributed Computing Systems)

Full Text Available
Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping

https://doi.org/10.1109/MICRO56248.2022.00047

Singh, Sonali; Sarma, Anup; Lu, Sen; Sengupta, Abhronil; Kandemir, Mahmut T.; Neftci, Emre; Narayanan, Vijaykrishnan; Das, Chita R. (October 2022, 55th IEEE/ACM International Symposium on Microarchitecture (MICRO))

Full Text Available
Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training

Sarma, Anup; Singh, Sonali; Jiang, Huaipan; Zhang, Rui; Kandemir, Mahmut; Das, Chita (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
Skipper: Enabling efficient SNN training through activation-checkpointing and time-skipping

Singh, Sonali; Sarma, Anup; Lu, Sen; Sengupta, Abhronil; Kandemir, Mahmut T.; Neftci, Emre; Narayanan, Vijaykrishnan; Das, Chita R (January 2022, Proceedings of the annual International Symposium on Microarchitecture)

Full Text Available
Gesture-SNN: Co-optimizing accuracy, latency and energy of SNNs for neuromorphic vision sensors

https://doi.org/10.1109/ISLPED52811.2021.9502506

Singh, Sonali; Sarma, Anup; Lu, Sen; Sengupta, Abhronil; Narayanan, Vijaykrishnan; Das, Chita R. (August 2021, 2021 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED))

Full Text Available
Exploiting Activation based Gradient Output Sparsity to Accelerate Backpropagation in CNNs

Sarma, Anup; Singh, Sonali; Jiang, Huaipan; Pattnaik, Ashutosh; Mishra, Asit K.; Narayanan, Vijaykrishnan; Kandemir, Mahmut T.; Das, Chita R. (September 2021, arXiv.org)

Full Text Available
NEBULA: A Neuromorphic Spin-Based Ultra-Low Power Architecture for SNNs and ANNs

https://doi.org/10.1109/ISCA45697.2020.00039

Singh, Sonali; Sarma, Anup; Jao, Nicholas; Pattnaik, Ashutosh; Lu, Sen; Yang, Kezhou; Sengupta, Abhronil; Narayanan, Vijaykrishnan; Das, Chita R. (May 2020, 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA))
Brain-inspired cognitive computing has so far followed two major approaches - one uses multi-layered artificial neural networks (ANNs) to perform pattern-recognition-related tasks, whereas the other uses spiking neural networks (SNNs) to emulate biological neurons in an attempt to be as efficient and fault-tolerant as the brain. While there has been considerable progress in the former area due to a combination of effective training algorithms and acceleration platforms, the latter is still in its infancy due to the lack of both. SNNs have a distinct advantage over their ANN counterparts in that they are capable of operating in an event-driven manner, thus consuming very low power. Several recent efforts have proposed various SNN hardware design alternatives; however, these designs still incur considerable energy overheads. In this context, this paper proposes a comprehensive design spanning the device, circuit, architecture and algorithm levels to build an ultra low-power architecture for SNN and ANN inference. For this, we use spintronics-based magnetic tunnel junction (MTJ) devices that have been shown to function both as neuro-synaptic crossbars and as thresholding neurons, and that can operate at ultra low voltage and current levels. Using this MTJ-based neuron model and synaptic connections, we design a low power chip that has the flexibility to be deployed for inference of SNNs, ANNs, and SNN-ANN hybrid networks - a distinct advantage compared to prior works. We demonstrate the competitive performance and energy efficiency of the SNNs as well as hybrid models on a suite of workloads. Our evaluations show that the proposed design, NEBULA, is up to 7.9× more energy efficient than a state-of-the-art design, ISAAC, in the ANN mode. In the SNN mode, our design is about 45× more energy-efficient than a contemporary SNN architecture, INXS. Power comparison between NEBULA ANN and SNN modes indicates that the latter is at least 6.25× more power-efficient for the observed benchmarks.
Full Text Available
